It's often said that a program spends 90% of its time in 10% of its codebase. Even within that 10%, developers tend to hyperfixate on a small fraction, usually one or two functions or loops. That focus is worthwhile and useful, but it shouldn't mean leaving smaller gains elsewhere on the table. Enough small gains can snowball into big ones; but while that's intuitively true, how do we reason about it? How many regressions does it take until we reach our breaking point? How much performance do we need to gain, per patch, to hit an agreed-upon target?
By reframing the problem we can tackle some of those questions. Framed in terms of a slightly modified annual compound interest formula, we get this:

\[ \Delta = P \left(1 + \frac{d}{100}\right)^{p} \]
Let's make sense of this. Our principal, \(P\), is the baseline of whatever metric we wish to optimize: execution time, memory usage, and so on. For simplicity's sake, let's set it to one so we can ignore it for now. \(\Delta\) is the resultant change to our baseline, \(p\) is the number of patches, and \( d\,\% \) is the target percentage change per patch (as measured by your benchmarking suite of choice).
With some trivial algebra we can rearrange the equation above and answer the questions posed earlier:
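As a sketch of that rearrangement (taking the principal as one), solving for \(p\) and for \(d\) gives:

\[
p = \frac{\ln \Delta}{\ln\!\left(1 + \frac{d}{100}\right)},
\qquad
d = 100\left(\Delta^{1/p} - 1\right)
\]

The first form answers how many patches at a fixed per-patch change it takes to reach a given overall change; the second, what per-patch change is needed to reach a target within a given patch budget.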
Note the scale difference between the X-axis and Y-axis.
Using this, developers can answer a couple of useful questions:
There are some important caveats to this modeling of the problem, namely:
| \(p\) (patch count) | \(\Delta\) with \(d = +5\,\%\) per patch | \(\Delta\) with \(d = -5\,\%\) per patch |
|---|---|---|
| 0 | 1.000 | 1.000 |
| 1 | 1.050 | 0.950 |
| 2 | 1.102 | 0.902 |
| 3 | 1.157 | 0.857 |
| 5 | 1.276 | 0.773 |
| 14 | 1.979 | 0.487 |
| 50 | 11.467 | 0.076 |
| 100 | 131.501 | 0.005 |
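As a quick sanity check, the model and its rearrangements can be sketched in a few lines of Python (the function names here are my own, not from anywhere official):

```python
import math

# A minimal sketch of the compound-change model. The baseline (principal)
# defaults to 1.0, matching the table above.

def delta(d: float, p: int, principal: float = 1.0) -> float:
    """Resultant change after p patches, each shifting the metric by d%."""
    return principal * (1 + d / 100) ** p

def patches_needed(target: float, d: float) -> float:
    """How many patches of d% each it takes to reach a target multiplier."""
    return math.log(target) / math.log(1 + d / 100)

def per_patch_change(target: float, p: int) -> float:
    """What per-patch percentage change reaches the target in p patches."""
    return 100 * (target ** (1 / p) - 1)

# Reproduce a couple of rows from the table:
print(delta(5, 14))    # 14 patches at +5% each
print(delta(-5, 50))   # 50 patches at -5% each
```

Running it reproduces the table's rows; for instance, fourteen +5% patches nearly double the metric, and it takes roughly fourteen patches of +5% each (`patches_needed(2.0, 5)`) to get there.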
In all, despite the caveats, this approach has its advantages. It can provide a practical framework for managing performance plans more effectively, helping set clear, achievable goals or boundaries that teams can use to navigate optimizing a codebase. It also supports developers focused on performance, offering them a structured way to justify enhancements or provide concrete guardrails on development trajectories.
Circling back to the start: don't hyperfixate on a single function or loop when optimizing your program's hot path. Avoid immediately writing off a change whose improvement seems small, as enough of them can yield big gains over time. Likewise, don't dismiss suggestions to fix regressions as "premature optimization", or the team may pay for it later.